TensorRT Memory + Runtime Report

Engine: engines/gpt2/rank0.engine

I/O Buffer Allocations (cudaMalloc tracked)

Note: This section tracks only explicit I/O buffers allocated by this program, not TensorRT internal workspace/scratch allocations.

Tensor                    Size (MB)   Approx. 64KB Pages   Approx. Fragmentation (KB)
present.21.key                2.000                   32                        0.000
present.21.value              2.000                   32                        0.000
present.22.key                2.000                   32                        0.000
present.22.value              2.000                   32                        0.000
present.23.key                2.000                   32                        0.000
present.23.value              2.000                   32                        0.000
present.13.key                2.000                   32                        0.000
present.13.value              2.000                   32                        0.000
present.14.key                2.000                   32                        0.000
present.14.value              2.000                   32                        0.000
present.15.key                2.000                   32                        0.000
present.15.value              2.000                   32                        0.000
present.16.key                2.000                   32                        0.000
present.16.value              2.000                   32                        0.000
present.17.key                2.000                   32                        0.000
present.17.value              2.000                   32                        0.000
present.18.key                2.000                   32                        0.000
present.18.value              2.000                   32                        0.000
present.19.key                2.000                   32                        0.000
present.19.value              2.000                   32                        0.000
present.20.key                2.000                   32                        0.000
present.20.value              2.000                   32                        0.000
present.5.key                 2.000                   32                        0.000
present.5.value               2.000                   32                        0.000
present.6.key                 2.000                   32                        0.000
present.6.value               2.000                   32                        0.000
present.7.key                 2.000                   32                        0.000
present.7.value               2.000                   32                        0.000
present.8.key                 2.000                   32                        0.000
present.8.value               2.000                   32                        0.000
present.9.key                 2.000                   32                        0.000
present.9.value               2.000                   32                        0.000
present.10.key                2.000                   32                        0.000
present.10.value              2.000                   32                        0.000
present.11.key                2.000                   32                        0.000
present.11.value              2.000                   32                        0.000
present.12.key                2.000                   32                        0.000
present.12.value              2.000                   32                        0.000
past_key_values.21.key        1.996                   32                        4.000
past_key_values.21.value      1.996                   32                        4.000
past_key_values.22.key        1.996                   32                        4.000
past_key_values.22.value      1.996                   32                        4.000
past_key_values.23.key        1.996                   32                        4.000
past_key_values.23.value      1.996                   32                        4.000
present.0.key                 2.000                   32                        0.000
present.0.value               2.000                   32                        0.000
present.1.key                 2.000                   32                        0.000
present.1.value               2.000                   32                        0.000
present.2.key                 2.000                   32                        0.000
present.2.value               2.000                   32                        0.000
present.3.key                 2.000                   32                        0.000
present.3.value               2.000                   32                        0.000
present.4.key                 2.000                   32                        0.000
present.4.value               2.000                   32                        0.000
past_key_values.13.key        1.996                   32                        4.000
past_key_values.13.value      1.996                   32                        4.000
past_key_values.14.key        1.996                   32                        4.000
past_key_values.14.value      1.996                   32                        4.000
past_key_values.15.key        1.996                   32                        4.000
past_key_values.15.value      1.996                   32                        4.000
past_key_values.16.key        1.996                   32                        4.000
past_key_values.16.value      1.996                   32                        4.000
past_key_values.17.key        1.996                   32                        4.000
past_key_values.17.value      1.996                   32                        4.000
past_key_values.18.key        1.996                   32                        4.000
past_key_values.18.value      1.996                   32                        4.000
past_key_values.19.key        1.996                   32                        4.000
past_key_values.19.value      1.996                   32                        4.000
past_key_values.20.key        1.996                   32                        4.000
past_key_values.20.value      1.996                   32                        4.000
past_key_values.5.key         1.996                   32                        4.000
past_key_values.5.value       1.996                   32                        4.000
past_key_values.6.key         1.996                   32                        4.000
past_key_values.6.value       1.996                   32                        4.000
past_key_values.7.key         1.996                   32                        4.000
past_key_values.7.value       1.996                   32                        4.000
past_key_values.8.key         1.996                   32                        4.000
past_key_values.8.value       1.996                   32                        4.000
past_key_values.9.key         1.996                   32                        4.000
past_key_values.9.value       1.996                   32                        4.000
past_key_values.10.key        1.996                   32                        4.000
past_key_values.10.value      1.996                   32                        4.000
past_key_values.11.key        1.996                   32                        4.000
past_key_values.11.value      1.996                   32                        4.000
past_key_values.12.key        1.996                   32                        4.000
past_key_values.12.value      1.996                   32                        4.000
input_ids                     0.000                    1                       63.996
logits                        0.192                    4                       59.684
past_key_values.0.key         1.996                   32                        4.000
attention_mask                0.002                    1                       62.000
past_key_values.0.value       1.996                   32                        4.000
past_key_values.1.key         1.996                   32                        4.000
past_key_values.1.value       1.996                   32                        4.000
past_key_values.2.key         1.996                   32                        4.000
past_key_values.2.value       1.996                   32                        4.000
past_key_values.3.key         1.996                   32                        4.000
past_key_values.3.value       1.996                   32                        4.000
past_key_values.4.key         1.996                   32                        4.000
past_key_values.4.value       1.996                   32                        4.000

Total I/O Size: 192.006 MB

Total I/O Pages (approx, 64KB): 3078 (~192.375 MB)

Internal Fragmentation (approx): 377.680 KB
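
The page and fragmentation columns follow from rounding each buffer up to whole 64KB pages. The sketch below is one way to reproduce them; it is not taken from the program that produced this report. The function name page_stats and the byte sizes in the usage lines are assumptions, back-computed from the rounded MB and KB values in the table.

```python
PAGE_BYTES = 64 * 1024  # 64KB page granularity, as used in the table


def page_stats(size_bytes: int, page_bytes: int = PAGE_BYTES) -> tuple[int, float]:
    """Return (pages, internal fragmentation in KB) for one buffer.

    Hypothetical helper: rounds the buffer up to whole pages and reports
    the unused tail of the last page.
    """
    pages = -(-size_bytes // page_bytes)  # integer ceiling division
    frag_kb = (pages * page_bytes - size_bytes) / 1024
    return pages, frag_kb


# Byte sizes below are assumptions inferred from the rounded table values.
print(page_stats(2 * 1024 * 1024))         # a 2.000 MB "present" buffer
print(page_stats(2 * 1024 * 1024 - 4096))  # a 1.996 MB "past_key_values" buffer
print(page_stats(2048))                    # a 0.002 MB buffer such as attention_mask
```

Under these assumed sizes, a 2.000 MB buffer fills exactly 32 pages, while a buffer that is 4 KB short of 2 MB still needs 32 pages and leaves 4 KB unused, which matches the two repeating row patterns above.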

Approximate Address-Space Page Map (64KB blocks)

This is a visualization of the virtual address range spanned by your I/O allocations, bucketed into 64KB blocks. It is an approximation, not a true GPU residency map.
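
A minimal sketch of the bucketing described above, assuming each allocation is known as a (base address, size in bytes) pair. The function name page_map and the example addresses are hypothetical; the report does not state how its map is built.

```python
BLOCK_BYTES = 64 * 1024  # bucket size used by the page map


def page_map(allocs, block_bytes=BLOCK_BYTES):
    """Mark which 64KB blocks in the spanned address range hold I/O data.

    allocs: list of (base_address, size_bytes) pairs with size_bytes > 0.
    Returns one bool per block, from the lowest to the highest touched block.
    """
    if not allocs:
        return []
    first = min(base for base, _ in allocs) // block_bytes
    last = max(base + size - 1 for base, size in allocs) // block_bytes
    occupied = [False] * (last - first + 1)
    for base, size in allocs:
        start = base // block_bytes
        end = (base + size - 1) // block_bytes
        for block in range(start, end + 1):
            occupied[block - first] = True
    return occupied


# Hypothetical addresses: a 4-byte buffer and a 128KB buffer with a gap between them.
example = page_map([(0x10000, 4), (0x40000, 2 * BLOCK_BYTES)])
print("".join("#" if used else "." for used in example))
```

Blocks marked as unoccupied only mean that none of the tracked I/O buffers cover that address range; other allocations may still live there.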

VRAM Used (sampled 1 Hz via nvidia-smi)

Total VRAM: 8192 MB

Each block = 1 second. Height/color = utilization.
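
One way to collect such samples is to poll nvidia-smi once per second and parse its CSV output. The sketch below is an assumption about how that could be done; the report does not show the exact command it ran. The helper names are hypothetical, and only the first GPU line is read.

```python
import subprocess
import time

# Standard nvidia-smi query options; values are reported in MiB with "nounits".
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram_line(line: str) -> tuple[int, int]:
    """Parse one 'used, total' CSV line into integers."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def sample_vram(seconds: int, interval_s: float = 1.0) -> list[tuple[int, int]]:
    """Poll nvidia-smi roughly once per interval and collect (used, total) pairs."""
    samples = []
    for _ in range(seconds):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_vram_line(out.splitlines()[0]))  # first GPU only (assumption)
        time.sleep(interval_s)
    return samples
```

Because each poll spawns a process, the real sampling interval is slightly longer than one second, and short allocation spikes between samples are not captured.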

Token Latency / Throughput

Latency (ms): min 2.883, avg 3.216, max 33.485

Avg throughput: 310.94 tok/s

Each block = 1 token. Height/color = latency.
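
The summary figures can be derived from the per-token latencies. The sketch below assumes throughput is computed as 1000 divided by the average latency in milliseconds, which is consistent with the reported values (1000 / 3.216 is about 310.9 tok/s) but is not confirmed by the report. The function name summarize_latency is hypothetical.

```python
def summarize_latency(latencies_ms):
    """Return min/avg/max latency in ms and tokens per second.

    Throughput is taken as 1000 / average latency, an assumption about
    how the report derives its tok/s figure.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    avg = sum(latencies_ms) / len(latencies_ms)
    return {
        "min_ms": min(latencies_ms),
        "avg_ms": avg,
        "max_ms": max(latencies_ms),
        "tok_per_s": 1000.0 / avg,
    }


# Made-up sample values, for illustration only.
print(summarize_latency([2.0, 4.0, 6.0]))
```

A maximum far above the average, as in the figures above, usually comes from a small number of slow tokens, so it is worth inspecting the per-token series before drawing conclusions from the average alone.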

Files